R is a software environment for statistical computing and graphics. Using R, you can perform rigorous statistical analysis, clean and manipulate data, and create publication-quality graphics.
Packages are collections of functions, data, and documentation that you install and load into R to make tasks easier. Some of the most popular R packages for working with data are dplyr, stringr, tidyr, and ggplot2.
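Here is a minimal sketch of installing the packages named above and loading one of them. It assumes an internet connection. The CRAN mirror URL is an illustrative choice. You only need to install a package once per computer, but you run `library()` in every new R session.

```r
pkgs <- c("dplyr", "stringr", "tidyr", "ggplot2")

# Install only the packages that are not installed yet (downloads from CRAN)
to_install <- pkgs[!pkgs %in% rownames(installed.packages())]
if (length(to_install) > 0) {
  install.packages(to_install, repos = "https://cloud.r-project.org")
}

# Load (attach) a package at the start of each R session to use its functions
library(ggplot2)
```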
There’s no easy way (yet) for new R users to find R packages that they might need. People are working on this problem. In the meantime, consult the following list or ask a Librarian!
Resources include:
You can create graphs in R without installing a package, but packages will allow you to create better visualizations that are any of the following:
ggplot2 is the most popular visualization package for R. It’s the best all-purpose package for creating many types of 2-dimensional visualizations.
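To compare the two approaches, the sketch below draws the same scatterplot twice. The first version uses base R's `plot()`, which needs no package. The second uses ggplot2. The built-in `mtcars` dataset and the axis labels are illustrative choices and do not come from this guide.

```r
library(ggplot2)

# Base R: no package needed; mtcars is a dataset that ships with R
plot(mtcars$wt, mtcars$mpg,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")

# ggplot2: the same scatterplot, built from data + aesthetic mapping + a point layer
p <- ggplot(data = mtcars, mapping = aes(x = wt, y = mpg)) +
  geom_point() +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon")
print(p)
```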
ggplot2 was built on the principles of the layered grammar of graphics, described by Hadley Wickham in "A Layered Grammar of Graphics" (2010). That work is based on work by Wilkinson, Anand, & Grossman (2005) and Jacques Bertin (1983).
Essentially: graphs are like sentences you can construct, and they have a grammar. The grammar of graphics consists of the following:
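at least one layer (each layer combines data, an aesthetic mapping, a geometric object, a statistical transformation, and a position adjustment)
a scale for each aesthetic mapping
a coordinate system
facets (optional)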
These components make up a graph.
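The components listed above can be spelled out one by one in ggplot2 code. This is a sketch only. ggplot2 normally fills in the scales and the Cartesian coordinate system by default, so they are written out here just to make each component visible. The axis titles and the choice to facet by `drv` are illustrative.

```r
library(ggplot2)

p <- ggplot(data = mpg) +                                   # data
  geom_point(mapping = aes(x = displ, y = hwy)) +            # layer: aesthetic mapping + geom (points)
  scale_x_continuous(name = "Engine displacement (L)") +     # scale for the x aesthetic
  scale_y_continuous(name = "Highway miles per gallon") +    # scale for the y aesthetic
  coord_cartesian() +                                        # coordinate system (the default one)
  facet_wrap(~ drv)                                          # facets (optional): one panel per drv value
print(p)
```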
Open RStudio.
Download the following file: script.R
In RStudio, go to File > Open File…
Select the script.R file that you just downloaded.
Click Open.
Let’s see an example of a simple graph created with ggplot. We are going to use the mpg data set about different cars and their properties.
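The preview below shows the first six rows of mpg, which comes with the ggplot2 package. You can print it yourself with a minimal snippet like this one:

```r
library(ggplot2)

head(mpg)   # first six rows of the mpg tibble
dim(mpg)    # number of rows and columns
?mpg        # help page describing each variable
```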
```
## # A tibble: 6 x 11
##   manufacturer model displ  year   cyl trans      drv     cty   hwy fl
##   <chr>        <chr> <dbl> <int> <int> <chr>      <chr> <int> <int> <chr>
## 1 audi         a4      1.8  1999     4 auto(l5)   f        18    29 p
## 2 audi         a4      1.8  1999     4 manual(m5) f        21    29 p
## 3 audi         a4      2.0  2008     4 manual(m6) f        20    31 p
## 4 audi         a4      2.0  2008     4 auto(av)   f        21    30 p
## 5 audi         a4      2.8  1999     6 auto(l5)   f        16    26 p
## 6 audi         a4      2.8  1999     6 manual(m5) f        18    26 p
## # ... with 1 more variables: class <chr>
```
The next example uses ggplot2 to look for a correlation between a car’s engine displacement (displ) and its highway mileage (hwy).
library(ggplot2): loads the ggplot2 package so that its functions are available
ggplot(): function that tells R you want to make a graph with ggplot2
data = mpg: says that you want to use the mpg dataset (sample data that comes with ggplot2)
geom_point(): function that adds a layer of points, which makes a scatterplot
mapping = aes(): function that maps data variables to aesthetics, such as the x- and y-axes
**Run the following code in your script file:**
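The sketch below builds the displacement vs. highway-mileage scatterplot from the pieces listed above. It follows the standard ggplot2 pattern, but the exact contents of script.R may differ. The plot is stored in a variable named `p` only so that it can be reused.

```r
library(ggplot2)

# Store the plot in p, then print it
p <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy))
p
```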
Make a scatterplot with cyl mapped to the x-axis and hwy mapped to the y-axis.
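One possible answer to this exercise keeps the same pattern and changes only the variable mapped to x. The variable name `p_cyl` is just an illustrative choice.

```r
library(ggplot2)

# cyl (number of cylinders) on the x-axis, highway mileage on the y-axis
p_cyl <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = cyl, y = hwy))
p_cyl
```

Because cyl takes only a few values (4, 5, 6, and 8), the points line up in vertical columns and many of them overlap.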
Make a scatterplot with displ mapped to the x-axis, hwy mapped to the y-axis, and class mapped to the color aesthetic. Run:
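A minimal version of that plot is shown below. Adding `color = class` inside `aes()` gives each car class its own color, and ggplot2 draws a legend automatically.

```r
library(ggplot2)

# A third variable, class, is mapped to the color aesthetic
p_class <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = class))
p_class
```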
In another version of this graph, the car’s drive system (drv: 4 = four-wheel drive, f = front-wheel drive, r = rear-wheel drive) is mapped to color instead.
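Here is a sketch of that version. It assumes the same displ and hwy axes as the earlier examples, and the variable name `p_drv` is illustrative.

```r
library(ggplot2)

# drv has three values (4, f, r), so the legend shows three colors
p_drv <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = drv))
p_drv
```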
Variables can be mapped to the following aesthetics. If you are publishing in black and white and can’t use color, consider using size or shape instead:
color
size
shape
alpha (transparency)
Substitute another aesthetic in place of color. Run the code:
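The sketch below swaps color for other aesthetics. Mapping `drv` to shape and `cty` (city mileage) to size and alpha are illustrative choices, not the guide's own script.

```r
library(ggplot2)

# shape: readable in black and white; works best with a few categories (drv has 3)
p_shape <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, shape = drv))
p_shape

# size and alpha (transparency) suit numeric variables such as cty
p_size <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, size = cty, alpha = cty))
p_size
```

ggplot2's default shape palette handles at most six categories. If you map a seven-value variable such as class to shape, ggplot2 warns you and leaves one group out of the plot.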
Facets are a way to create multiple smaller charts, or subplots, based on a variable. Run this code to see what faceting does:
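Here is a minimal faceting example. `facet_wrap(~ class)` draws one small scatterplot per car class and wraps the panels into rows. The `nrow = 2` layout is an illustrative choice.

```r
library(ggplot2)

# One panel per value of class, arranged in two rows
p_facet <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy)) +
  facet_wrap(~ class, nrow = 2)
p_facet
```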
Replace class with another variable in the dataset, for example trans, drv, or cyl.
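Two variations are sketched below. The first facets by drv instead of class. The second uses `facet_grid()`, which splits the plot by two variables at once, to show drv in rows and cyl in columns. Both are illustrative choices.

```r
library(ggplot2)

# Facet by drive system instead of class: one panel per drv value
p_facet_drv <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy)) +
  facet_wrap(~ drv)
p_facet_drv

# facet_grid(): drv in rows, cyl in columns (every combination gets a panel)
p_grid <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy)) +
  facet_grid(drv ~ cyl)
p_grid
```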